Joint recognition of text and layout in historical Russian documents


Abstract

In this paper, we evaluated the Document Attention Network (DAN), the first end-to-end segmentation-free architecture, on historical Russian documents. The DAN model jointly recognizes both text and layout from whole documents: it takes documents of any size as input and outputs the text as well as logical layout tokens. For comparison purposes, we conducted our experiments on the Digital Peter dataset, which has been recognized at line level. The dataset consists of Peter the Great manuscripts; the ground truths are represented according to a sophisticated XML schema which enables an accurate and detailed definition of layout regions. At page level, we achieved a Character Error Rate (CER) of 18.71 %, a Word Error Rate (WER) of 39.7 %, a Layout Ordering Error Rate (LOER) of 14.11 %, and a mean Average Precision (mAP) of 66.67 %.
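The CER and WER figures reported in the abstract are edit-distance metrics. As a minimal sketch, assuming the usual definition (Levenshtein distance divided by the reference length) and not the authors' actual evaluation code, they could be computed as follows. The function names `edit_distance`, `cer`, and `wer` are illustrative, and whitespace tokenization for words is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word edits divided by the number of reference words."""
    ref_words = reference.split()  # assumption: whitespace tokenization
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    ref = "a b c d"
    hyp = "a b x d"
    print(f"CER: {100 * cer(ref, hyp):.2f} %")
    print(f"WER: {100 * wer(ref, hyp):.2f} %")
```

Both rates can exceed 100 % when the hypothesis is much longer than the reference. When scoring a whole test set, a common choice is to sum the edit distances and divide by the total reference length instead of averaging per-page rates; which of the two the paper uses is not stated in the abstract.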


Similar resources

Handwritten Text Recognition for Historical Documents

The amount of digitized legacy documents has been rising dramatically over the last years due mainly to the increasing number of on-line digital libraries publishing this kind of documents. The vast majority of them remain waiting to be transcribed into a textual electronic format (such as ASCII or PDF) that would provide historians and other researchers new ways of indexing, consulting and que...


Text Line Extraction from Complex Layout Documents

There are numerous stylish documents which do not have the traditional text layouts where printed text regions are not parallel to each other. Such complex layouts make text line extraction challenging due to multi-orientation of paragraphs. This paper introduces a system for the text line extraction from the complex layout documents. Proposed method is based on the concept of dilation and hist...


Ideological and cultural orientations in translation of narrative text: the case of Hajji Baba of Isfahan

Among the factors that may influence a translator's mind during translation is the transfer of ideology through text or discourse. The aim of this research was to analyze the ideological and cultural aspects of the English source text written by James Morier, titled The Adventures of Hajji Baba of Isfahan (1823), and its Persian translation by Mirza Habib Isfahani (1880).

Supervised Text Region Identification on Historical Documents

We present multi-column text region identification support for Ocular, the unsupervised historical printed document transcription project of Berg-Kirkpatrick et al. (2013). We use structured prediction with rich features defined on the input document and incorporate a transition model based on prior document layout assumptions. Our model is trained using a structured-SVM objective on a randomly...


Text-image alignment for historical handwritten documents

We describe our work on text-image alignment in the context of building a historical document retrieval system. We aim at aligning images of words in handwritten lines with their text transcriptions. The images of handwritten lines are automatically segmented from the scanned pages of historical documents and then manually transcribed. To train automatic routines to detect words in an image of hand...



Journal

Journal title: Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki

Year: 2023

ISSN: 2226-1494, 2500-0373

DOI: https://doi.org/10.17586/2226-1494-2023-23-3-585-594